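Affective computing plays an important role in the relationship between humans and machines. This paper proposes a speech emotion recognition (SER) system based on the speech signal that uses new techniques at different processing stages. The system consists of three stages: feature extraction, feature selection, and finally feature classification. In the first stage, complex long-term statistical features are extracted from both the speech signal and the glottal-waveform signal, using a combination of new and diverse features such as prosodic and spectral features. One of the challenges for SER systems is distinguishing correlated emotions. These features are good discriminators of speech emotions and improve the ability of the SER system to recognize both similar and different emotions. The resulting feature vector has a large number of dimensions and is naturally redundant. In the second stage, the number of feature vector dimensions is reduced using classical feature selection techniques as well as a new quantum-inspired technique for dimensionality reduction. In the third stage, the optimized feature vector is classified by a weighted deep sparse extreme learning machine (ELM) classifier. The classifier performs classification in three steps: sparse random feature learning, orthogonal random projection using the singular value decomposition (SVD) technique, and discriminative classification in the last step using the generalized Tikhonov regularization technique. In addition, many existing emotional datasets suffer from imbalanced data distribution, which in turn increases the classification error and decreases system performance. This paper therefore also proposes a new weighting method to deal with class imbalance, which is more efficient than existing weighting methods. The proposed method is evaluated on three standard emotional databases.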
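The abstract does not describe the new weighting method, so it cannot be reproduced here. As a point of comparison, the following minimal sketch shows one widely used existing scheme of the kind the paper aims to improve on: inverse-frequency ("balanced") class weights, where each class receives the weight n_samples / (n_classes * class_count). The function name and the emotion labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count).

    This is a common baseline heuristic, not the weighting method proposed
    in the paper (which the abstract does not specify).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical, deliberately imbalanced label set.
labels = ["neutral"] * 6 + ["anger"] * 3 + ["fear"] * 1
weights = balanced_class_weights(labels)
for cls, weight in weights.items():
    print(cls, round(weight, 3))
```

Rare classes receive larger weights, so their errors count more in a weighted loss such as the one a weighted ELM minimizes.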
Real-world robotic grasping can be done robustly if the complete 3D Point Cloud Data (PCD) of an object is available. In practice, however, PCDs are often incomplete because objects are viewed from only a few sparse viewpoints before the grasping action, which leads to wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. The network is inherently invariant to object pose and point permutation, and it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
This paper deals with the problem of statistical and system heterogeneity in a cross-silo Federated Learning (FL) framework where there is a limited number of Consumer Internet of Things (CIoT) devices in a smart building. We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering, dubbed ``G-Fedfilt''. The proposed aggregator enables a structured flow of information based on the graph's topology. This behavior allows capturing the interconnection of CIoT devices and training domain-specific models. The embedded graph filter is equipped with a tunable parameter that enables a continuous trade-off between domain-agnostic and domain-specific FL. In the domain-agnostic case, it forces G-Fedfilt to act similarly to the conventional Federated Averaging (FedAvg) aggregation rule. The proposed G-Fedfilt also enables an intrinsic smooth clustering based on the graph connectivity, without the clusters being explicitly specified, which further boosts the personalization of the models in the framework. In addition, the proposed scheme uses communication-efficient time scheduling to alleviate the system heterogeneity. This is accomplished by adaptively adjusting the number of training data samples and the sparsity of the models' gradients to reduce communication desynchronization and latency. Simulation results show that the proposed G-Fedfilt achieves up to $3.99\%$ better classification accuracy than the conventional FedAvg in terms of model personalization on the statistically heterogeneous local datasets, and up to $2.41\%$ higher accuracy than FedAvg when testing the generalization of the models.
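To make the idea of graph-filtered aggregation concrete, here is a minimal sketch that smooths client models over a device graph with repeated Laplacian diffusion steps, i.e. the polynomial graph filter (I - step_size * L)^steps. This is not the G-Fedfilt filter itself, whose form the abstract does not give; the function name, the `steps` and `step_size` parameters (standing in for the paper's tunable parameter), and the toy graph are all assumptions made for illustration.

```python
def graph_filter_aggregate(models, adjacency, steps, step_size):
    """Smooth client models over a graph with `steps` Laplacian diffusion rounds.

    models:    list of parameter vectors, one per client
    adjacency: symmetric weight matrix of the client graph (zero diagonal)
    Each round computes w_i <- w_i - step_size * sum_j A_ij * (w_i - w_j).
    Illustrative stand-in only, not the filter used in the paper.
    """
    current = [list(m) for m in models]
    n = len(current)
    for _ in range(steps):
        updated = []
        for i in range(n):
            row = []
            for d in range(len(current[i])):
                # (L w)_i for coordinate d, where L is the graph Laplacian
                laplacian_term = sum(
                    adjacency[i][j] * (current[i][d] - current[j][d])
                    for j in range(n)
                )
                row.append(current[i][d] - step_size * laplacian_term)
            updated.append(row)
        current = updated
    return current


# Three hypothetical clients with one-parameter models on a complete graph.
models = [[0.0], [3.0], [6.0]]
complete_graph = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

local = graph_filter_aggregate(models, complete_graph, steps=0, step_size=1 / 3)
averaged = graph_filter_aggregate(models, complete_graph, steps=1, step_size=1 / 3)
```

With zero steps every client keeps its own model (fully domain-specific). On this complete graph a single step with `step_size=1/3` moves every client to the plain average, which matches unweighted FedAvg. On sparser graphs, intermediate settings mix each model mainly with its neighbours, which is the kind of trade-off the abstract describes.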
Solute transport in porous media is relevant to a wide range of applications in hydrogeology, geothermal energy, underground CO2 storage, and a variety of chemical engineering systems. Due to the complexity of solute transport in heterogeneous porous media, traditional solvers require high-resolution meshing and are therefore computationally expensive. This study explores the application of a mesh-free method based on deep learning to accelerate the simulation of solute transport. We employ Physics-informed Neural Networks (PiNN) to solve solute transport problems in homogeneous and heterogeneous porous media governed by the advection-dispersion equation. Unlike traditional neural networks that learn from large training datasets, PiNNs leverage only the strong-form mathematical models to simultaneously solve for multiple dependent or independent field variables (e.g., pressure and solute concentration fields). In this study, we construct the PiNN using a periodic activation function to better represent the complex physical signals (i.e., pressure) and their derivatives (i.e., velocity). Several case studies are designed to investigate the proposed PiNN's capability to handle different degrees of complexity. A manual hyperparameter tuning method is used to find the best PiNN architecture for each test case. Point-wise error and mean square error (MSE) measures are employed to assess the PiNN predictions against ground truth solutions obtained analytically or numerically using the finite element method. Our findings show that the PiNN predictions are in good agreement with the ground truth solutions while reducing computational complexity and cost by at least three orders of magnitude.
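For reference, a common textbook form of the advection-dispersion equation and the corresponding physics-informed loss can be written as follows. This is a sketch under stated assumptions (divergence-free velocity field $\mathbf{v}$, constant porosity, no source term); the symbols $\hat{c}_\theta$, $N_r$, and $N_b$ are introduced here for illustration, and the exact formulation and loss weighting used in the paper may differ.

```latex
% Advection-dispersion equation for the solute concentration c(x, t)
\frac{\partial c}{\partial t} + \mathbf{v} \cdot \nabla c
  = \nabla \cdot \left( \mathbf{D} \, \nabla c \right)

% Strong-form residual evaluated on the network output \hat{c}_\theta
r_\theta(\mathbf{x}, t)
  = \frac{\partial \hat{c}_\theta}{\partial t}
  + \mathbf{v} \cdot \nabla \hat{c}_\theta
  - \nabla \cdot \left( \mathbf{D} \, \nabla \hat{c}_\theta \right)

% Loss: mean squared residual at N_r collocation points plus the mismatch
% at N_b initial/boundary points with prescribed values c_j
\mathcal{L}(\theta)
  = \frac{1}{N_r} \sum_{i=1}^{N_r} r_\theta(\mathbf{x}_i, t_i)^2
  + \frac{1}{N_b} \sum_{j=1}^{N_b}
    \left( \hat{c}_\theta(\mathbf{x}_j, t_j) - c_j \right)^2
```

Because the loss is built from the governing equation and the initial and boundary conditions alone, no labelled simulation data are needed, which is the property the abstract contrasts with conventional data-driven networks.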
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
The JPEG standard is widely used in different image processing applications. One of the main components of the JPEG standard is the quantisation table (QT), since it plays a vital role in image properties such as image quality and file size. In recent years, several efforts based on population-based metaheuristic (PBMH) algorithms have been made to find the proper QT(s) for a specific image, although they do not take the user's opinion into consideration. Take an Android developer as an example: the developer prefers a small image, while the optimisation process yields a high-quality image with a huge file size. Another pitfall of current works is a lack of comprehensive coverage, meaning that the QT(s) cannot provide all possible combinations of file size and quality. Therefore, this paper makes three distinct contributions. First, to include the user's opinion in the compression process, the file size of the output image can be controlled by the user in advance. Second, to tackle the lack of comprehensive coverage, we suggest a novel representation. Our proposed representation not only provides more comprehensive coverage but also finds the proper value of the quality factor for a specific image without any background knowledge. Both the change in representation and the change in objective function are independent of the search strategy and can be used with any type of PBMH algorithm. Therefore, as the third contribution, we also provide a comprehensive benchmark of 22 state-of-the-art and recently introduced PBMH algorithms on our new formulation of JPEG image compression. Our extensive experiments on different benchmark images and in terms of different criteria show that our novel formulation for JPEG image compression works effectively.
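For context on how a quality factor and a quantisation table interact, the sketch below follows the scaling convention used by the IJG libjpeg encoder: a quality value from 1 to 100 is mapped to a percentage scale, which is then applied to a base table. This shows the conventional baseline only, not the representation proposed in the paper; the function name is hypothetical, and only the first row of the example luminance table from Annex K of the JPEG standard is used to keep the sketch short.

```python
def scale_quant_table(base_table, quality):
    """Scale base quantisation values by an IJG-style quality factor (1-100).

    quality < 50  -> scale = 5000 // quality   (coarser quantisation)
    quality >= 50 -> scale = 200 - 2 * quality (finer quantisation)
    Entries are rounded and clamped to the baseline JPEG range [1, 255].
    """
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base_table]


# First row of the example luminance quantisation table (JPEG Annex K).
BASE_ROW = [16, 11, 10, 16, 24, 40, 51, 61]

print(scale_quant_table(BASE_ROW, 50))   # unchanged base values
print(scale_quant_table(BASE_ROW, 25))   # doubled: smaller file, lower quality
print(scale_quant_table(BASE_ROW, 100))  # all ones: largest file, best quality
```

A single quality factor can only move along this one-parameter family of tables, which illustrates why a search over the table entries themselves can reach combinations of file size and quality that the quality factor alone cannot.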
This paper presents a Temporal Graph Neural Network (TGNN) framework for the detection and localization of false data injection and ramp attacks on the system state in smart grids. Capturing the topological information of the system through the GNN framework, along with the state measurements, can improve the performance of the detection mechanism. The problem is formulated as a classification problem, solved through a GNN with a message-passing mechanism, to identify abnormal measurements. The residual block used in the aggregation process of message passing and the gated recurrent unit can lead to improved computational time and performance. The performance of the proposed model has been evaluated through extensive simulations of power system states and attack scenarios, showing promising performance. The sensitivity of the model to the intensity and location of the attacks and the model's detection delay versus detection accuracy have also been evaluated.
In this work, we propose a communication-efficient two-layer federated learning algorithm for distributed setups that include a core server and multiple edge servers with clusters of devices. Assuming different learning tasks, clusters with the same task collaborate. To implement the algorithm over wireless links, we propose a scalable clustered over-the-air aggregation scheme for the uplink together with a bandwidth-limited broadcast scheme for the downlink, which requires only two single resource blocks for each algorithm iteration, independent of the number of edge servers and devices. This setup faces interference among devices in the uplink and interference among edge servers in the downlink, both of which must be modeled rigorously. We first develop a spatial model for the setup by modeling devices as a Poisson cluster process over the edge servers and quantify the uplink and downlink error terms due to the interference. Accordingly, we present a comprehensive mathematical approach to derive the convergence bound for the proposed algorithm with any number of collaborating clusters in the setup, and we provide important special cases and design remarks. Finally, we show that despite the interference in the proposed uplink and downlink schemes, the proposed algorithm achieves high learning accuracy for a variety of parameters.
The DETR object detection approach applies the transformer encoder and decoder architecture to detect objects and achieves promising performance. In this paper, we present a simple approach that addresses the main problem of DETR, its slow convergence, by using a representation learning technique. In this approach, we detect an object bounding box as a pair of keypoints, the top-left corner and the center, using two decoders. By detecting objects as paired keypoints, the model builds up a joint classification and pair association on the output queries from the two decoders. For the pair association, we propose utilizing a contrastive self-supervised learning algorithm that does not require a specialized architecture. Experimental results on the MS COCO dataset show that Pair DETR converges at least 10x faster than the original DETR and 1.5x faster than Conditional DETR during training, while achieving consistently higher Average Precision scores.
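A top-left corner and a center are enough to determine an axis-aligned box, because the bottom-right corner is the reflection of the top-left corner through the center. The following minimal sketch shows that geometry only; the function name and the coordinate convention (x to the right, y downward, pixel units) are assumptions for illustration, and it says nothing about how Pair DETR predicts or associates the keypoints.

```python
def box_from_keypoints(top_left, center):
    """Recover an axis-aligned box (x1, y1, x2, y2) from two keypoints.

    The bottom-right corner mirrors the top-left corner through the center:
    x2 = 2 * cx - x1 and y2 = 2 * cy - y1.
    """
    x1, y1 = top_left
    cx, cy = center
    return (x1, y1, 2 * cx - x1, 2 * cy - y1)


# Hypothetical keypoints in pixel coordinates.
box = box_from_keypoints(top_left=(10, 20), center=(30, 50))
print(box)  # (10, 20, 50, 80)
```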
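Various methods based on non-negative matrix factorization (NMF) add new terms to the cost function to adapt the model to specific tasks, such as clustering, or to preserve some structural properties in the reduced space (e.g., local invariance). The added terms are mainly weighted by a hyperparameter that controls the balance of the overall formulation and guides the optimization process towards the objective. The result is a parameterized NMF method. However, NMF methods adopt unsupervised approaches to estimate the factorizing matrices, so the ability to perform prediction (e.g., classification) using the newly obtained features is not guaranteed. The objective of this work is to design an evolutionary framework that learns the hyperparameter of the parameterized NMF and estimates the factorizing matrices in a supervised way, making them more suitable for classification problems. Moreover, we claim that applying NMF-based algorithms separately to different class pairs, instead of applying them once to the whole dataset, improves the effectiveness of the matrix factorization process. This results in training multiple parameterized NMF algorithms with different balancing parameter values. A cross-validation combination learning framework is adopted, and a genetic algorithm is used to identify the optimal set of hyperparameter values. The experiments we conducted on both real and synthetic datasets demonstrate the effectiveness of the proposed approach.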
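As background for the unsupervised baseline that the abstract starts from, here is a minimal pure-Python sketch of plain NMF with the classic multiplicative update rules for the squared Frobenius error. It contains no added cost terms, no balancing hyperparameter, and no evolutionary search, so it is not the proposed method. All function names, the random initialization, and the small `eps` guard against division by zero are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iters=100, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(len(w))]
    return w, h


# A small rank-1 matrix, so a rank-1 factorization can fit it closely.
V = [[1.0, 2.0], [2.0, 4.0]]
W, H = nmf(V, rank=1)
print(squared_error(V, W, H))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. A parameterized NMF of the kind described in the abstract would add further weighted terms to the cost, which changes these update rules.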